31 research outputs found

    Concept-modulated model-based offline reinforcement learning for rapid generalization

    The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation, we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases, as well as zero-shot generalization to similar variations, compared to model-based and model-free approaches.

    Context Meta-Reinforcement Learning via Neuromodulation

    Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network and regulates neuronal activities in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To prove the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations than the baselines.

    Sliced Cramer synaptic consolidation for preserving deeply learned representations

    Deep neural networks suffer from the inability to preserve the learned data representation (i.e., catastrophic forgetting) in domains where the input data distribution is non-stationary and changes during training. Various selective synaptic plasticity approaches have been recently proposed to preserve network parameters that are crucial for previously learned tasks while learning new tasks. We explore such selective synaptic plasticity approaches through a unifying lens of memory replay and show the close relationship between methods like Elastic Weight Consolidation (EWC) and Memory-Aware-Synapses (MAS). We then propose a fundamentally different class of preservation methods that aim at preserving the distribution of the network’s output at an arbitrary layer for previous tasks while learning a new one. We propose the sliced Cramér distance as a suitable choice for such preservation and evaluate our Sliced Cramér Preservation (SCP) algorithm through extensive empirical investigations on various network architectures in both supervised and unsupervised learning settings. We show that SCP consistently utilizes the learning capacity of the network better than online-EWC and MAS methods on various incremental learning tasks.
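
The abstract above names the sliced Cramér distance without defining it. As a rough illustration only, the sketch below uses one common definition: the 1D Cramér distance is taken as the integral of the squared difference between two empirical CDFs, and the sliced version averages it over random unit projections. The function names `cramer_1d` and `sliced_cramer`, the number of slices, and the seed are assumptions made for this example; the paper's exact formulation and the way it is used as a preservation term are not given in this listing.

```python
import numpy as np


def cramer_1d(x, y):
    """Integral of the squared difference between two empirical CDFs (1D samples)."""
    grid = np.sort(np.concatenate([x, y]))
    # Both empirical CDFs are constant on each interval [grid[i], grid[i+1]).
    fx = np.searchsorted(np.sort(x), grid[:-1], side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid[:-1], side="right") / len(y)
    return float(np.sum((fx - fy) ** 2 * np.diff(grid)))


def sliced_cramer(X, Y, n_slices=64, seed=0):
    """Average 1D Cramér distance over random unit directions (hypothetical helper).

    X and Y are arrays of shape (n_samples, n_features), for example layer
    outputs recorded under two versions of a network.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_slices, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # unit-length slices
    return float(np.mean([cramer_1d(X @ t, Y @ t) for t in thetas]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    old_outputs = rng.normal(size=(200, 8))
    new_outputs = old_outputs + 0.5  # a shifted output distribution
    print(sliced_cramer(old_outputs, old_outputs))  # identical samples give 0.0
    print(sliced_cramer(old_outputs, new_outputs))  # positive for shifted samples
```

In a continual-learning setting, a quantity like this could serve as a penalty on how far a layer's output distribution drifts while a new task is learned, but that usage is an assumption here rather than a description of the SCP algorithm.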

    A-EMS: An Adaptive Emergency Management System for Autonomous Agents in Unforeseen Situations

    Reinforcement learning agents are unable to respond effectively when faced with novel, out-of-distribution events until they have undergone a significant period of additional training. For lifelong learning agents, which cannot be simply taken offline during this period, suboptimal actions may be taken that can result in unacceptable outcomes. This paper presents the Autonomous Emergency Management System (A-EMS), an online, data-driven, emergency-response method that aims to provide autonomous agents the ability to react to unexpected situations that are very different from those they have been trained or designed to address. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational autoencoder. This optimization is achieved online in a data-efficient manner (on the order of 30 to 80 data-points) using a modified Bayesian optimization procedure. The potential of A-EMS is demonstrated through emergency situations devised in a simulated 3D car-driving application.

    Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture

    This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems which impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQN's results and even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.

    A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future. Comment: To appear in Neural Network

    Dose-Dependent Effects of Closed-Loop tACS Delivered During Slow-Wave Oscillations on Memory Consolidation

    Sleep is critically important to consolidate information learned throughout the day. Slow-wave sleep (SWS) serves to consolidate declarative memories, a process previously modulated with open-loop non-invasive electrical stimulation, though not always effectively. These failures to replicate could be explained by the fact that stimulation has only been performed in open-loop, as opposed to closed-loop where phase and frequency of the endogenous slow-wave oscillations (SWOs) are matched for optimal timing. The current study investigated the effects of closed-loop transcranial Alternating Current Stimulation (tACS) targeting SWOs during sleep on memory consolidation. Twenty-one participants took part in a three-night, counterbalanced, randomized, single-blind, within-subjects study, investigating performance changes (correct rate and F1 score) on images in a target detection task over 24 h. During sleep, 1.5 mA closed-loop tACS was delivered in phase over electrodes at F3 and F4 and 180° out of phase over electrodes at bilateral mastoids at the frequency (range 0.5–1.2 Hz) and phase of ongoing SWOs for a duration of 5 cycles in each discrete event throughout the night. Data were analyzed in a repeated measures ANOVA framework, and results show that verum stimulation improved post-sleep performance specifically on generalized versions of images used in training at both morning and afternoon tests compared to sham, suggesting the facilitation of schematization of information, but not of rote, veridical recall. We also found a surprising inverted U-shaped dose effect of sleep tACS, which is interpreted in terms of tACS-induced facilitatory and subsequent refractory dynamics of SWO power in scalp EEG. This is the first study showing a selective modulation of long-term memory generalization using a novel closed-loop tACS approach, which holds great potential for both healthy and neuropsychiatric populations.